Khader M., Awajan A., Al-Naymat G., 2018. The Effects of Natural Language Processing on Big Data Analysis. Proceedings of the 19th International Arab Conference on Information Technology (ACIT'2018). 28-30 November 2018. Beirut, Lebanon.

Abstract

Social networks are one of the main sources of big data. They continuously produce huge volumes of varied types of data at high velocity. This data contains valuable information whose extraction requires efficient and scalable analysis techniques. Hadoop/MapReduce is considered the most suitable framework for handling big data because of its scalability, reliability and simplicity. Sentiment analysis is one of the basic applications for extracting valuable information from data. It studies people’s opinions by classifying their written text as having positive or negative polarity.

In this work, a sentiment analysis method for a Twitter data set is analyzed. The method uses the Naïve Bayes algorithm to classify text into positive and negative polarity. Several linguistic and NLP preprocessing techniques were applied to the data set in order to study their effect on the quality of big data classification. The applied preprocessing techniques improved the classification accuracy of the Naïve Bayes algorithm. The experiments show that NLP and linguistic processing enhance the performance of the sentiment analysis by 5%, yielding an accuracy of 73% on the data set used.
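
As a rough illustration of the pipeline described above, the following sketch combines a few simple preprocessing steps with a multinomial Naïve Bayes classifier. It is a minimal single-machine example, not the Hadoop/MapReduce implementation evaluated in the paper. The stop-word list, the chosen preprocessing steps (lowercasing, URL and mention removal, stop-word filtering), the Laplace smoothing, the toy tweets and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "this", "i", "it", "so", "to", "and"}


def preprocess(text):
    """Lowercase, strip URLs and @mentions, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def train(samples):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in preprocess(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in preprocess(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed log likelihood of the word.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data, not taken from the paper's Twitter data set.
TOY_TWEETS = [
    ("I love this phone, it is great", "positive"),
    ("What a wonderful happy day", "positive"),
    ("I hate this awful service", "negative"),
    ("This is a terrible, sad movie", "negative"),
]

model = train(TOY_TWEETS)
```

Calling classify("great happy phone", model) on this toy model returns "positive". To get a feel for how a preprocessing step affects classification quality, one could remove or add a step in preprocess and compare accuracy on a held-out set.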